KMID : 1022420210130030065
|
|
Phonetics and Speech Sciences, 2021, Vol. 13, No. 3, pp. 65-70
|
|
Designing a large recording script for open-domain English speech synthesis
|
|
Kim Sun-Hee, Kim Ho-Jeong, Lee Yoo-Seop, Kim Bo-Ryoung, Won Yong-Kook, Kim Bong-Wan
|
|
Abstract
|
|
|
This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text from five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud style sentences and 15,928 conversational style sentences), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of the script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone coverage and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability across all sentences is 9.03. The results of the analysis show that the proposed method is effective in producing a large recording script for English speech synthesis, demonstrating good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.
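The coverage criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: the regex tokenizer, type coverage taken as the share of a reference vocabulary that appears in the script, and diphone coverage taken as the share of all ordered phone pairs from a given inventory that occur in the script. The function names are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens, apostrophes kept (e.g. "don't").
    return re.findall(r"[a-z']+", text.lower())


def type_token_counts(sentences):
    """Return (number of tokens, number of types) for a list of sentences."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    return sum(counts.values()), len(counts)


def type_coverage(sentences, reference_vocab):
    """Share of reference vocabulary types that occur in the script (assumed definition)."""
    reference = set(reference_vocab)
    if not reference:
        return 0.0
    script_types = {tok for s in sentences for tok in tokenize(s)}
    return len(script_types & reference) / len(reference)


def diphone_coverage(phone_sequences, phone_inventory):
    """Share of possible ordered phone pairs that occur in the script (assumed definition)."""
    inventory = set(phone_inventory)
    if not inventory:
        return 0.0
    seen = {
        (a, b)
        for seq in phone_sequences
        for a, b in zip(seq, seq[1:])
        if a in inventory and b in inventory
    }
    return len(seen) / (len(inventory) ** 2)


if __name__ == "__main__":
    script = ["The cat sat.", "The dog sat down."]
    print(type_token_counts(script))
    print(type_coverage(script, ["the", "cat", "dog", "bird"]))
    print(diphone_coverage([["k", "ae", "t"], ["t", "ae", "k"]], ["k", "ae", "t"]))
```

In practice the phone sequences would come from a pronunciation lexicon or a grapheme-to-phoneme tool, and the denominator for diphone coverage might be restricted to phonotactically valid pairs rather than all ordered pairs.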
|
|
Keywords
|
|
recording script, speech synthesis, English, word coverage, phonetic coverage, readability